Introduction to Multiomics Data Integration and Visualisation
Day 1: Introduction
Research in aging
Aging is a time-dependent functional decline that affects most living organisms. It is characterized by a progressive loss of physiological integrity, leading to impaired function and increased vulnerability to death.
The accumulation of cellular damage is widely considered to be the general cause of aging (López-Otín et al. 2013), which explains why the studies from the last three decades have mostly focused on the cellular aspects of aging.
Currently, the aging hallmarks are:
- Genomic instability;
- Telomere attrition;
- Epigenetic alterations;
- Loss of proteostasis;
- Deregulated nutrient sensing;
- Mitochondrial dysfunction;
- Cellular senescence;
- Stem cell exhaustion;
- Altered intercellular communication.
However, aging should not only be seen as a cellular dysfunction but also as an organismal process. All the cells within an organism undergo aging, but not all of them age simultaneously: different tissues age differently (Son et al. 2019). Thus, aging is a decline affecting not only cellular components but also organismal biological processes in general.
Aging and the worm
Aging research was inaugurated more than 30 years ago, following the isolation of the first long-lived strains in C. elegans (Johnson 2013). In the past three decades, the nematode has played a fundamental role in aging research. It is a powerful model organism when it comes to studying aging:
- It has a short lifespan (~21 days in normal conditions);
- Mutants can be easily generated and phenotypically screened for aging defects;
- Longevity assays are easy to perform and very informative;
- Its genome has been fully sequenced and presents fundamental resemblances with that of more complex metazoans.
One of the most used phenotypic assays in worms is the longevity assay. Longevity assays are useful to understand whether a gene is involved in aging (Chen et al. 2015). RNAi is used to knock down a gene of interest and the resulting longevity is measured. If longevity increases, the gene's function normally reduces it.
In worms, RNAi can be used in large-scale screens. Such screens have led to the identification of tens of genes involved in aging and, more generally, in the regulation of longevity. They have been instrumental in deciphering the molecular pathways involved in aging (Hamilton et al. 2005).
The question
Although RNAi has been instrumental in studying the process of aging in worms, its readouts are generally phenotypic, e.g. does the worm live longer? Or does it rapidly stop moving after knock-down? These readouts are poorly suited to understanding the aging-related role of a protein coded by a given gene in specific tissues. For instance, slo-1 is a gene coding for a potassium channel. Its mutation or knock-down accelerates aging (Chen et al. 2013). Because we have some information about the nature and function of the encoded protein, it is rather easy to speculate in which tissue it has an aging-related function (hint: the neurons!).
However, if we do not know anything about the gene (e.g. in which tissue it is expressed, which transcription factors regulate its expression, the type of protein encoded by it, …), it becomes challenging to draw conclusions on the molecular mechanisms involving the protein encoded by the gene.
Rather than relying on RNAi screens, one could leverage the large amount of “omics” data generated in C. elegans to better characterize aging processes at the organismal level. The goal of this study is to analyze aging-related perturbations in worms in a comprehensive manner to (1) identify the tissues undergoing important remodeling during aging and (2) understand the molecular networks involved in aging. More specifically, we would like to fulfill the following aims:
- Find tissue-specific proteins and genes deregulated during aging;
- Find their associations with known diseases;
- Find potential targets for drug engineering.
The approach
RNAi has been extensively used to study aging in worms. However, next-generation techniques such as high-throughput sequencing or mass spectrometry now allow for more comprehensive multiomics studies of biological processes. We will rely on results from these techniques to:
- Day 2: Annotate sets of tissue-specific proteins (using proteomics results) and compare them with sets of tissue-specific genes (using transcriptomics results);
- Day 3: Find genomic regions regulating these genes (using epigenomics results);
- Day 4: Identify networks of transcription factors regulating these genes (using epigenomics results).
About this vignette
Over the next three days, we will be conducting analysis of several types of datasets in R. This vignette is to be used as a guideline and lists important questions or technical aspects. The R code is only here to suggest one solution to the question(s) raised. Before looking at it, it is recommended to investigate the data by yourself and try out your own commands.
Day 2: Proteomics
Today we will be focusing on analyzing the proteomics data generated in Reinke et al. (2017): “In vivo mapping of tissue- and subcellular-specific proteomes in Caenorhabditis elegans”. Reading the abstract of the paper is recommended to understand what the authors have attempted to do. In short, the authors have used a protein proximity labelling technique to label cytoplasmic or nuclear proteins in four different tissues of the worm: intestine, epidermis, body wall muscle and pharyngeal muscle. Proteins enriched in each subcellular location or tissue were then identified by mass spectrometry.
Set up environment
You will first need to make sure that you are working in the right folder. You can also load important packages at this point.
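A minimal setup sketch, assuming that the packages attached in the SessionInfo section at the end of this vignette are the ones required. The helper name `missing_pkgs` is hypothetical, and setting `PROJECT_PATH` (used later when loading the proteomics table) to the working directory is an assumption, since the vignette does not show where it is defined.

```r
# Packages listed as attached in the SessionInfo section of this vignette
required_pkgs <- c(
    "tidyverse", "rtracklayer", "biomaRt", "htmlwidgets",
    "gprofiler2", "DOSE", "clusterProfiler", "org.Hs.eg.db"
)

# Hypothetical helper: return the packages that are not installed
missing_pkgs <- function(pkgs) {
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

to_install <- missing_pkgs(required_pkgs)
if (length(to_install) > 0) {
    message("Missing packages: ", paste(to_install, collapse = ", "))
} else {
    # Attach everything once all packages are available
    invisible(lapply(required_pkgs, library, character.only = TRUE))
}

# Assumption: the working directory is the folder from which '../data' is reachable
PROJECT_PATH <- getwd()
```

Checking for missing packages before attaching them gives one readable message instead of an error at the first `library()` call.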
Load and inspect the data
The data has been fetched from the original paper and reformatted for the purpose of the course. Start by loading it from data/1602426_TableS4_processed.csv.
Questions:
- How many proteins are studied?
- How many are in each tissue?
proteomics <- read.csv(
    file.path(PROJECT_PATH, '../data/1602426_TableS4_processed.csv'),
    na.strings = "NA"
)
# Number of proteins in the file
nrow(proteomics)
## [1] 3722
# Number of proteins detected in each tissue
colSums(proteomics[,38:45])
##         Epidermis_cytoplasm           Epidermis_nucleus
##                        2442                        1278
## Pharyngeal.muscle_cytoplasm   Pharyngeal.muscle_nucleus
##                         837                         104
##  Body.wall.muscle_cytoplasm    Body.wall.muscle_nucleus
##                        1375                         609
##         Intestine_cytoplasm           Intestine_nucleus
##                        2449                         971
Adding gene information
This file only contains the protein names. Eventually, we will be looking at genes as well. The simplest approach is to convert everything to unique gene IDs. For this, we will rely on biomaRt.
Questions:
- What is the difference between SwissProt and TrEMBL?
- Why are there fewer annotations from SwissProt?
- Why does the original dataset contain both types of IDs?
- Why are there some protein IDs that do not match any gene ID?
- Are there proteins coming from the same gene?
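To make the last two questions concrete, here is a small sketch on made-up identifiers (none of them come from the real dataset). It assumes the same column names as the biomaRt query and shows how `match()` leaves unmatched proteins as `NA` and how `table()` reveals genes represented by several proteins.

```r
# Toy ID table (made-up identifiers, same column names as in the biomaRt query)
ids_toy <- data.frame(
    uniprotswissprot = c("P1", "", "P3", ""),
    uniprotsptrembl  = c("", "T2", "", "T4"),
    wormbase_gene    = c("WBGene01", "WBGene01", "WBGene02", "WBGene03"),
    stringsAsFactors = FALSE
)
ids_toy$protID <- paste0(ids_toy$uniprotswissprot, ids_toy$uniprotsptrembl)

# Toy proteomics table; "X9" has no entry in the ID table
prot_toy <- data.frame(
    UniProtKB = c("P1", "T2", "P3", "X9"),
    stringsAsFactors = FALSE
)
prot_toy$WormBaseID <- ids_toy$wormbase_gene[match(prot_toy$UniProtKB, ids_toy$protID)]

# Proteins without any gene ID
unmatched <- prot_toy$UniProtKB[is.na(prot_toy$WormBaseID)]

# Genes represented by more than one protein (table() ignores NA by default)
genes_per_prot <- table(prot_toy$WormBaseID)
multi_prot_genes <- names(genes_per_prot)[genes_per_prot > 1]
```

Note that `match()` only returns the first hit, so a protein ID listed several times in the ID table would silently be assigned to a single gene.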
# You can get IDs of genes and associated proteins using biomaRt
# ensembl <- useDataset("celegans_gene_ensembl", mart = useMart("ensembl"))
# ids <- getBM(
#     attributes = c("uniprotswissprot", "uniprotsptrembl", "wormbase_gene"),
#     mart = ensembl
# )
# Otherwise, you can also get the ids table from the data folder:
ids <- read.table('../data/proteins_genes_IDs.txt', header = TRUE, sep = '\t')
# Merging SwissProt and TrEMBL IDs into one column will make your life easier
ids$protID <- paste0(ids$uniprotswissprot, ids$uniprotsptrembl)
# Then a simple match function can be used to translate the protein IDs into gene IDs
proteomics$WormBaseID <- ids$wormbase_gene[match(proteomics$UniProtKB, ids$protID)]
# A few proteins do not have any gene ID (why?). We will ignore these for the rest of the study.
proteomics <- proteomics[!is.na(proteomics$WormBaseID),]
Separating into lists of tissue-specific proteins
Let’s figure out which proteins are specifically present in each tissue.
Questions:
- How many proteins are present specifically in one tissue?
- For each set of proteins, are they more cytoplasmic or nuclear-enriched?
- Which ones are transcription factors? (Hint: there is a list of all transcription factors annotated in C. elegans in the data folder…)
- Can you say anything about the tissue-specific TFs?
- What do you think about this? Are transcription factors enriched in the sets of tissue-specific proteins?
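For the enrichment question, one common option is Fisher's exact test on a 2 x 2 contingency table. The sketch below uses made-up counts purely to show how the four cells are derived from the four totals; the variable names are illustrative and the numbers are not those of the dataset.

```r
# Made-up counts for illustration only
n_universe <- 100   # all proteins considered
n_tf       <- 10    # transcription factors in the universe
n_specific <- 20    # tissue-specific proteins
n_both     <- 5     # tissue-specific proteins that are also TFs

# 2 x 2 table, filled column by column:
# rows = tissue-specific status, columns = TF status
contingency_toy <- matrix(
    c(
        n_both,              n_tf - n_both,
        n_specific - n_both, n_universe - n_tf - n_specific + n_both
    ),
    nrow = 2,
    dimnames = list(
        tissue_specific = c("yes", "no"),
        TF = c("yes", "no")
    )
)

# The four cells must add up to the size of the universe
stopifnot(sum(contingency_toy) == n_universe)

res_toy <- fisher.test(contingency_toy)
res_toy$estimate  # conditional estimate of the odds ratio
res_toy$p.value
```

The choice of the universe (here `n_universe`) strongly influences the result, so it is worth asking which set of proteins could actually have been detected in the experiment.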
# Let's retrieve the list of proteins specifically enriched in a single tissue
# (levels() assumes Tissue.specific is a factor, the read.csv() default before R 4.0)
list_prots <- lapply(
    levels(proteomics$Tissue.specific),
    function(TISSUE) {
        proteomics$UniProtKB %>%
            '['(proteomics$Tissue.specific == TISSUE & !is.na(proteomics$Tissue.specific)) %>%
            as.character()
    }
) %>% setNames(levels(proteomics$Tissue.specific))
lengths(list_prots)
##  Body wall muscle specific         Epidermis specific
##                         55                        125
##         Intestine specific Pharyngeal muscle specific
##                        130                         21
# Because we will need it later, you can also get the same list but with gene IDs
list_genes <- lapply(
    levels(proteomics$Tissue.specific),
    function(TISSUE) {
        proteomics$WormBaseID %>%
            '['(proteomics$Tissue.specific == TISSUE & !is.na(proteomics$Tissue.specific)) %>%
            as.character()
    }
) %>% setNames(levels(proteomics$Tissue.specific))
# A list of most of the transcription factors found in C. elegans genome can be found in data/TFs.txt
tfs <- readLines('../data/TFs.txt')
length(tfs)
## [1] 824
# Let's see which proteins enriched in tissues are transcription factors
tfs_in_tissue_specific_prots <- lapply(list_prots, function(prots) {
    prots[proteomics$WormBaseID[match(prots, proteomics$UniProtKB)] %in% tfs]
})
# Let's see if transcription factors are enriched
n_prots <- nrow(ids)
n_tfs <- length(tfs)
n_tissue_spe_prots <- sum(lengths(list_prots))
n_tfs_in_tissue_specific_prots <- sum(lengths(tfs_in_tissue_specific_prots))
contingency_matrix <- matrix(
    cbind(
        c(n_tfs_in_tissue_specific_prots, n_tfs - n_tfs_in_tissue_specific_prots),
        c(n_tissue_spe_prots - n_tfs_in_tissue_specific_prots, n_prots - n_tfs - n_tissue_spe_prots + n_tfs_in_tissue_specific_prots)
    ),
    nrow = 2
)
fisher.test(contingency_matrix)
##
## Fisher's Exact Test for Count Data
##
## data: contingency_matrix
## p-value = 0.6
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.3354 1.5753
## sample estimates:
## odds ratio
## 0.7862
Compare it to existing tissue-specific datasets from RNA-seq
Let’s see if this set of proteins overlaps with the tissue-specific gene annotation. For this, we can use the data in data/gene_annotations.gff3. This is a .gff3 file, a format used to add specific information to gene annotations. This annotation file has not been published yet.
Questions:
- What are the fundamental differences between the two datasets?
- Are the experiments from the same developmental stage?
- Are the two datasets consistent with each other?
- What are some good ways to represent the intersection between these datasets?
- Can you comment on the list of proteins specifically present in pharyngeal muscles, when compared to the gene annotations from transcriptomics?
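One simple way to represent the intersection of two annotations is a cross-tabulation, which can then be shown as a heatmap. The sketch below uses six made-up genes and invented labels, so the object names and values are purely illustrative.

```r
# Made-up annotations for six genes, for illustration only
annot_toy <- data.frame(
    gene = paste0("gene", 1:6),
    proteomics = c("Intestine", "Intestine", "Epidermis", "Epidermis", "Muscle", "Muscle"),
    transcriptomics = c("Intest.", "Ubiq.", "Hypod.", "Hypod.", "Muscle", "Neurons_Muscle"),
    stringsAsFactors = FALSE
)

# Cross-tabulation: one row per proteomics label, one column per transcriptomics label
cross_tab <- table(annot_toy$proteomics, annot_toy$transcriptomics)

# Fraction of each proteomics set falling into each transcriptomics class
row_fractions <- prop.table(cross_tab, margin = 1)
```

Row-wise fractions make sets of very different sizes comparable, which raw counts do not.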
# Let's import the gene annotation information
genes <- import('../data/gene_annotations.gff3')
names(genes) <- genes$ID
# It will be easier if we convert the tissue-specific gene annotation to a factor
genes$which.tissues <- factor(genes$which.tissues, levels = c(
    'Germline', 'Sperm', 'EarlyGermline', 'Neurons', 'Muscle', 'Hypod.', 'Intest.',
    'Germline_Neurons', 'Germline_Muscle', 'Germline_Hypod.', 'Germline_Intest.', 'Neurons_Muscle', 'Neurons_Hypod.', 'Neurons_Intest.', 'Muscle_Hypod.', 'Muscle_Intest.', 'Hypod._Intest.',
    'Germline_Neurons_Muscle', 'Germline_Neurons_Hypod.', 'Germline_Neurons_Intest.', 'Germline_Muscle_Hypod.', 'Germline_Muscle_Intest.', 'Germline_Hypod._Intest.', 'Neurons_Muscle_Hypod.', 'Neurons_Muscle_Intest.', 'Neurons_Hypod._Intest.', 'Muscle_Hypod._Intest.',
    'Germline_Neurons_Muscle_Hypod.', 'Germline_Neurons_Muscle_Intest.', 'Germline_Neurons_Hypod._Intest.', 'Germline_Muscle_Hypod._Intest.', 'Neurons_Muscle_Hypod._Intest.',
    'Germline_Neurons_Muscle_Hypod._Intest.',
    'Soma', 'Ubiq.', 'Ubiq.-Reg', 'Unclassified', 'Low', 'non-prot-cod'
))
# We can compare the two different tissue-specific annotations
# (sapply() gathers the four tables into one matrix: classes as rows, protein sets as columns)
sapply(list_genes, function(g) {
    table(genes[g]$which.tissues)
})
##                                        Body wall muscle specific Epidermis specific Intestine specific Pharyngeal muscle specific
## Germline                                                       0                  0                  2                          0
## Sperm                                                          0                  0                  0                          0
## EarlyGermline                                                  0                  0                  0                          0
## Neurons                                                        0                  0                  1                          0
## Muscle                                                        34                  0                  0                          2
## Hypod.                                                         0                 41                  2                          0
## Intest.                                                        0                  1                 39                          0
## Germline_Neurons                                               0                  0                  0                          0
## Germline_Muscle                                                0                  0                  0                          0
## Germline_Hypod.                                                0                  1                  0                          0
## Germline_Intest.                                               0                  0                  0                          0
## Neurons_Muscle                                                 3                  0                  0                          1
## Neurons_Hypod.                                                 0                  0                  0                          0
## Neurons_Intest.                                                0                  1                  0                          0
## Muscle_Hypod.                                                  0                  0                  0                          0
## Muscle_Intest.                                                 0                  0                  2                          0
## Hypod._Intest.                                                 0                  0                  2                          0
## Germline_Neurons_Muscle                                        0                  0                  0                          1
## Germline_Neurons_Hypod.                                        0                  0                  0                          0
## Germline_Neurons_Intest.                                       0                  0                  0                          0
## Germline_Muscle_Hypod.                                         0                  0                  0                          0
## Germline_Muscle_Intest.                                        0                  0                  0                          0
## Germline_Hypod._Intest.                                        0                  0                  0                          0
## Neurons_Muscle_Hypod.                                          0                  1                  0                          0
## Neurons_Muscle_Intest.                                         2                  0                 11                          3
## Neurons_Hypod._Intest.                                         0                  0                  0                          0
## Muscle_Hypod._Intest.                                          0                  0                  1                          0
## Germline_Neurons_Muscle_Hypod.                                 0                  5                  0                          0
## Germline_Neurons_Muscle_Intest.                                0                  0                  0                          0
## Germline_Neurons_Hypod._Intest.                                0                  0                  0                          0
## Germline_Muscle_Hypod._Intest.                                 0                  0                  0                          0
## Neurons_Muscle_Hypod._Intest.                                  0                  0                  0                          0
## Germline_Neurons_Muscle_Hypod._Intest.                         0                  0                  0                          0
## Soma                                                           3                  7                  5                          1
## Ubiq.                                                          4                 19                 15                          0
## Ubiq.-Reg                                                      0                  0                  0                          0
## Unclassified                                                   4                 10                 18                          4
## Low                                                            0                  4                  3                          7
## non-prot-cod                                                   0                  0                  0                          0
# We can also plot something
df <- data.frame(
    transcriptome_annotations = genes[unlist(list_genes)]$which.tissues,
    proteomics_annotations = proteomics[match(unlist(list_genes), proteomics$WormBaseID),]$Tissue.specific
) %>%
    table() %>%
    as.data.frame()
p <- ggplot(df, aes(y = 1, x = transcriptome_annotations)) +
    geom_tile(aes(fill = Freq)) +
    geom_text(aes(label = Freq)) +
    scale_fill_gradientn(colours = c('white', 'orange', 'red')) +
    theme_bw() +
    facet_wrap(~proteomics_annotations) +
    labs(y = '', x = 'Transcriptomics-based annotations') +
    theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank()) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
p
Save your work!!
That’s it for today! Don’t forget to save your progress.
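As a reminder of how progress can be saved and restored, here is a small round trip with `save()` and `load()`. The objects and the file name are toy placeholders (a temporary file is used so that nothing is overwritten); in practice you would save the objects created today to a file of your choice.

```r
# Toy objects standing in for the results of the day
list_prots_toy <- list(Intestine = c("P1", "P2"), Epidermis = "P3")
list_genes_toy <- list(Intestine = c("WBGene01", "WBGene02"), Epidermis = "WBGene03")

# Hypothetical file name, written to a temporary directory
rdata_file <- file.path(tempdir(), "day2_toy.RData")
save(list_prots_toy, list_genes_toy, file = rdata_file)

# Later (e.g. at the start of the next session), restore the objects.
# Loading into a separate environment avoids overwriting existing variables.
restored <- new.env()
loaded_names <- load(rdata_file, envir = restored)
loaded_names
```

`load()` returns the names of the restored objects, which is a quick way to check what a given .RData file contains.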
Day 3: ATAC-seq
Today you will be focusing on investigating the ATAC-seq data generated in Jänes et al. (2018): “Chromatin accessibility dynamics across C. elegans development and ageing”.
Set up environment
Once again, first make sure that you are working in the right folder. You can also load important packages at this point.
Load previous work
The work done in Day 2 has been stored in an RData object.
Get the ATAC-seq aging data
The data has been fetched from the original paper and reformatted for the purpose of the course. Start by loading it from data/ATAC_aging.gff3.
Questions:
- Which format is the ATAC_aging.gff3 file? What does the file contain?
- Look at the cluster information: can you visualize variations of promoter accessibility during aging?
- Use the SeqPlots software to get more insights about the clusters.
- In which tissue(s) are the associated genes transcribed and translated?
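To visualize variations across time points, a common trick is to express each value as a log2 ratio to the mean of its row, so that promoters with very different absolute accessibility become comparable. A minimal sketch on made-up values (the promoter and stage names are invented):

```r
# Made-up accessibility values (rows = promoters, columns = ageing time points)
atac_toy <- matrix(
    c(10, 10, 10,
       5, 10, 30,
      40, 20, 15),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("promA", "promB", "promC"), c("YA", "D7", "D14"))
)

# log2 ratio of each value to the mean of its row.
# apply() over rows returns the result transposed, hence the t().
log2_ratio <- t(apply(atac_toy, 1, function(row) log2(row / mean(row))))
log2_ratio
```

A flat profile (promA) gives zeros, an opening promoter (promB) goes from negative to positive values, and a closing one (promC) does the opposite. A value of zero in the input gives `-Inf`, and a row made only of zeros gives `NaN`, so such rows are worth checking before plotting.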
REs <- import('../data/ATAC-seq_aging.gff3')
REs <- REs[REs$ageing_prom_cluster_label != '.']
ATAC_aging <- mcols(REs)[, 8:12] %>%
    data.frame() %>%
    data.matrix() %>%
    as.data.frame()
# Visualization in R
df <- data.frame(
    locus = REs$locus,
    apply(ATAC_aging, 1, function(ROW) {log2(ROW/mean(ROW))}) %>% t() %>% data.frame(),
    cluster = REs$ageing_prom_cluster_label
) %>%
    gather(stage, expr, -cluster, -locus) %>%
    mutate(stage = factor(stage, levels = c('FPM_ATAC_aYA', 'FPM_ATAC_aD3', 'FPM_ATAC_aD7', 'FPM_ATAC_aD10', 'FPM_ATAC_aD14'))) %>%
    filter(cluster != '.')
p <- ggplot(df, aes(x = stage, y = expr, color = cluster, group = locus)) +
    geom_line(alpha = 0.05) +
    theme_bw() +
    facet_wrap(~cluster) +
    theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank()) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
    theme(legend.position = 'none')
p
## Warning: Removed 30 rows containing missing values (geom_path).
# Export bed files of promoters from each cluster and visualize the accessibility profiles over each set using SeqPlots
for (cluster in levels(df$cluster)) {
    export(granges(REs[REs$ageing_prom_cluster_label == cluster]), paste0('ATAC_cluster-', cluster, '.bed'))
}
# Check the association between REs and proteins
list_genes_aging <- lapply(levels(df$cluster), function(cluster) {
    proms <- REs[REs$ageing_prom_cluster_label == cluster & REs$regulatory_class %in% c('bidirect-promoter', 'fwd-promoter', 'rev-promoter')]
    genes_cluster <- proms$WormBaseID %>% as.character() %>% strsplit(',') %>% unlist()
    return(genes_cluster)
}) %>% setNames(levels(df$cluster))
# We can also plot something
df <- data.frame(
    cluster_annotations = unlist(lapply(seq_along(list_genes_aging), function(K) {rep(names(list_genes_aging)[K], length(list_genes_aging[[K]]))})),
    proteomics_annotations = proteomics[match(unlist(list_genes_aging), proteomics$WormBaseID),]$Tissue.specific
) %>%
    table() %>%
    as.data.frame() %>%
    filter(cluster_annotations != 'stable')
p <- ggplot(df, aes(y = 1, x = cluster_annotations)) +
    geom_tile(aes(fill = Freq)) +
    geom_text(aes(label = Freq)) +
    scale_fill_gradientn(colours = c('white', 'orange', 'red')) +
    theme_bw() +
    facet_wrap(~proteomics_annotations) +
    labs(y = '', x = 'Ageing cluster') +
    theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank()) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
p
Investigate the genes associated with varying promoters during aging
Questions:
- What is the function of genes going up (down) during aging?
- What are their counterparts in mammals?
- What are the diseases associated with these genes?
- Comment on the rationale of the analysis: we are looking at genes associated to promoters de-regulated during aging…
# GO enrichment analysis
genes_down <- unique(unlist(list_genes_aging[2:5]))
genes_up <- unique(unlist(list_genes_aging[6:9]))
gos <- gost(
    list(
        up = genes_up,
        down = genes_down
    ),
    organism = 'celegans',
    multi_query = TRUE,
    user_threshold = 0.01,
    sources = c('GO:BP', 'GO:MF', 'GO:CC')
)
p <- gostplot(gos, capped = TRUE, interactive = TRUE)
saveWidget(p, 'GOs.html')
# Homo sapiens homologs
ensembl <- useDataset("celegans_gene_ensembl", mart = useMart("ensembl"))
human_homologues_down <- getBM(
    filters = c("ensembl_gene_id"),
    values = genes_down,
    attributes = c('ensembl_gene_id', 'hsapiens_homolog_ensembl_gene'),
    mart = ensembl
)
human_homologues_up <- getBM(
    filters = c("ensembl_gene_id"),
    values = genes_up,
    attributes = c('ensembl_gene_id', 'hsapiens_homolog_ensembl_gene'),
    mart = ensembl
)
# DOSE
human_homologues_down_ENTREZ <- bitr(
    human_homologues_down$hsapiens_homolog_ensembl_gene %>% '['(nchar(.) > 0),
    fromType = "ENSEMBL",
    toType = c("ENTREZID"),
    OrgDb = org.Hs.eg.db
)$ENTREZID
diseases_enrichment_down <- enrichDO(
    gene = human_homologues_down_ENTREZ,
    ont = "DO",
    pvalueCutoff = 0.01,
    pAdjustMethod = "BH",
    readable = TRUE
)
Day 4: ChIP-seq
Today you will be focusing on investigating ChIP-seq data generated by the modENCODE consortium, to understand which TFs regulate the promoters whose accessibility varies during aging.
Set up environment
Once again, you need to make sure that you are working in the right folder. Important packages will also be loaded at this point.
Load previous work
The work done in Day 2 and Day 3 has been stored in an RData object.
Get the ChIP-seq data
Questions:
- What is modENCODE?
- How to download all the data from modENCODE?
- What useful format can be used here to easily get information on TF binding profiles? Is .bed format better than .bw format for our purpose? Why?
Investigate the TFs associated with varying promoters during aging
Questions:
- How to summarize all the modENCODE data?
- Which metric should be calculated to see if a TF is enriched in a set of promoters?
- How can you represent the importance of all TFs in each cluster?
- Can you define binding motifs for the TFs enriched in some clusters?
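As one possible answer to the metric question, the sketch below computes a fold enrichment and a hypergeometric p-value for a single TF in each promoter cluster. It is only an illustration on made-up data: the table `promoters_toy`, its columns and the helper `tf_enrichment` are all hypothetical, and in practice `tf_bound` would come from overlapping promoters with ChIP-seq peaks.

```r
# Made-up data: 12 promoters, their ageing cluster, and whether a TF peak overlaps them
promoters_toy <- data.frame(
    cluster  = rep(c("up", "down", "stable"), each = 4),
    tf_bound = c(TRUE, TRUE, TRUE, FALSE,
                 FALSE, FALSE, TRUE, FALSE,
                 FALSE, FALSE, FALSE, FALSE),
    stringsAsFactors = FALSE
)

# Hypothetical helper: enrichment of TF-bound promoters in one cluster versus all promoters
tf_enrichment <- function(df, cluster_name) {
    in_cluster <- df$cluster == cluster_name
    k <- sum(df$tf_bound & in_cluster)   # bound promoters in the cluster
    n <- sum(in_cluster)                 # promoters in the cluster
    K <- sum(df$tf_bound)                # bound promoters overall
    N <- nrow(df)                        # all promoters
    data.frame(
        cluster = cluster_name,
        fold_enrichment = (k / n) / (K / N),
        # P(X >= k) under the hypergeometric null
        p_value = phyper(k - 1, K, N - K, n, lower.tail = FALSE),
        stringsAsFactors = FALSE
    )
}

enrichment_toy <- do.call(
    rbind,
    lapply(unique(promoters_toy$cluster), tf_enrichment, df = promoters_toy)
)
enrichment_toy
```

Repeating this for every TF gives a TF-by-cluster matrix of fold enrichments, which lends itself to a heatmap; with many TFs and clusters, the p-values would need correcting for multiple testing.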
Defining key TFs involved in aging?
Questions:
- Is there a group of motifs functionally interacting?
- What can you find in the literature about these factors?
SessionInfo
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] org.Hs.eg.db_3.7.0 AnnotationDbi_1.44.0 Biobase_2.42.0
## [4] clusterProfiler_3.10.1 DOSE_3.8.2 gprofiler2_0.1.3
## [7] htmlwidgets_1.3 biomaRt_2.38.0 rtracklayer_1.42.2
## [10] GenomicRanges_1.34.0 GenomeInfoDb_1.18.2 IRanges_2.16.0
## [13] S4Vectors_0.20.1 BiocGenerics_0.28.0 forcats_0.4.0
## [16] stringr_1.4.0 dplyr_0.8.0.1 purrr_0.3.2
## [19] readr_1.3.1 tidyr_0.8.3 tibble_2.1.1
## [22] ggplot2_3.1.1 tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
## [1] fgsea_1.13.0 colorspace_1.4-1
## [3] ggridges_0.5.1 qvalue_2.14.1
## [5] XVector_0.22.0 rstudioapi_0.10
## [7] farver_1.1.0 urltools_1.7.2
## [9] ggrepel_0.8.0 bit64_0.9-7
## [11] lubridate_1.7.4 xml2_1.2.0
## [13] splines_3.5.2 GOSemSim_2.8.0
## [15] knitr_1.22 polyclip_1.10-0
## [17] jsonlite_1.6 Rsamtools_1.34.1
## [19] broom_0.5.1 GO.db_3.7.0
## [21] ggforce_0.2.1 compiler_3.5.2
## [23] httr_1.4.0 rvcheck_0.1.3
## [25] backports_1.1.3 assertthat_0.2.1
## [27] Matrix_1.2-17 lazyeval_0.2.2
## [29] cli_1.1.0 tweenr_1.0.1
## [31] htmltools_0.3.6 prettyunits_1.0.2
## [33] tools_3.5.2 igraph_1.2.4
## [35] gtable_0.3.0 glue_1.3.1
## [37] GenomeInfoDbData_1.2.0 reshape2_1.4.3
## [39] DO.db_2.9 fastmatch_1.1-0
## [41] Rcpp_1.0.1 enrichplot_1.2.0
## [43] cellranger_1.1.0 Biostrings_2.50.2
## [45] nlme_3.1-137 ggraph_1.0.2
## [47] xfun_0.5 rvest_0.3.2
## [49] XML_3.98-1.19 europepmc_0.3
## [51] zlibbioc_1.28.0 MASS_7.3-51.1
## [53] scales_1.0.0 hms_0.4.2
## [55] SummarizedExperiment_1.12.0 RColorBrewer_1.1-2
## [57] yaml_2.2.0 memoise_1.1.0
## [59] gridExtra_2.3 UpSetR_1.3.3
## [61] triebeard_0.3.0 stringi_1.3.1
## [63] RSQLite_2.1.1 BiocParallel_1.16.6
## [65] rlang_0.4.2 pkgconfig_2.0.2
## [67] matrixStats_0.54.0 bitops_1.0-6
## [69] evaluate_0.13 lattice_0.20-38
## [71] GenomicAlignments_1.18.1 labeling_0.3
## [73] cowplot_0.9.4 bit_1.1-14
## [75] tidyselect_0.2.5 plyr_1.8.4
## [77] magrittr_1.5 bookdown_0.9.2
## [79] R6_2.4.0 generics_0.0.2
## [81] DelayedArray_0.8.0 DBI_1.0.0
## [83] pillar_1.3.1 haven_2.1.0
## [85] withr_2.1.2 RCurl_1.95-4.12
## [87] modelr_0.1.4 crayon_1.3.4
## [89] plotly_4.8.0 rmarkdown_1.12.6
## [91] viridis_0.5.1 progress_1.2.0
## [93] grid_3.5.2 readxl_1.3.1
## [95] data.table_1.12.0 blob_1.1.1
## [97] rmdformats_0.3.6 digest_0.6.18
## [99] gridGraphics_0.3-0 munsell_0.5.0
## [101] ggplotify_0.0.3 viridisLite_0.3.0
References
Chen, Albert Tzong-Yang, Chunfang Guo, Omar A. Itani, Breane G. Budaitis, Travis W. Williams, Christopher E. Hopkins, Richard C. McEachin, et al. 2015. “Longevity Genes Revealed by Integrative Analysis of Isoform-Specific daf-16/FoxO Mutants of Caenorhabditis elegans.” Genetics 201 (2): 613. https://doi.org/10.1534/genetics.115.177998.
Chen, Chun-Hao, Yen-Chih Chen, Hao-Ching Jiang, Chung-Kuan Chen, and Chun-Liang Pan. 2013. “Neuronal aging: learning from C. elegans.” Journal of Molecular Signaling 8 (0). https://doi.org/10.1186/1750-2187-8-14.
Hamilton, Benjamin, Yuqing Dong, Mami Shindo, Wenyu Liu, Ian Odell, Gary Ruvkun, and Siu Sylvia Lee. 2005. “A systematic RNAi screen for longevity genes in C. elegans.” Genes & Development 19 (13): 1544–55. https://doi.org/10.1101/gad.1308205.
Jänes, Jürgen, Yan Dong, Michael Schoof, Jacques Serizay, Alex Appert, Chiara Cerrato, Carson Woodbury, et al. 2018. “Chromatin accessibility dynamics across C. elegans development and ageing.” eLife, October. https://doi.org/10.7554/eLife.37344.
Johnson, Thomas E. 2013. “25 Years after age-1: Genes, Interventions and the Revolution in Aging Research.” Experimental Gerontology 48 (7): 640. https://doi.org/10.1016/j.exger.2013.02.023.
López-Otín, Carlos, Maria A. Blasco, Linda Partridge, Manuel Serrano, and Guido Kroemer. 2013. “The Hallmarks of Aging.” Cell 153 (6): 1194–1217. https://doi.org/10.1016/j.cell.2013.05.039.
Reinke, Aaron W, Raymond Mak, Emily R Troemel, and Eric J Bennett. 2017. “In Vivo Mapping of Tissue- and Subcellular-Specific Proteomes in Caenorhabditis elegans.” Science Advances 3 (5): 1–12.
Son, Heehwa G., Ozlem Altintas, Eun Ji E. Kim, Sujeong Kwon, and Seung-Jae V. Lee. 2019. “Age-dependent changes and biomarkers of aging in Caenorhabditis elegans.” Aging Cell 18 (2): e12853. https://doi.org/10.1111/acel.12853.